Skip to content

Support Nessie Catalog in Iceberg connector#11701

Merged
ebyhr merged 1 commit intotrinodb:masterfrom
nastra:add-nessie-support
May 29, 2023
Merged

Support Nessie Catalog in Iceberg connector#11701
ebyhr merged 1 commit intotrinodb:masterfrom
nastra:add-nessie-support

Conversation

@nastra
Copy link
Copy Markdown
Contributor

@nastra nastra commented Mar 29, 2022

This PR integrates the (Nessie catalog functionality)[https://github.com/apache/iceberg/tree/master/nessie/src/main/java/org/apache/iceberg/nessie] to the Iceberg connector. It adds the following new things:

  • a new CatalogType called NESSIE
  • an IcebergNessieCatalogModule that sets up all necessary dependencies (including a client to connect to the Nessie server)
  • a NessieConfig that includes configuration settings required for Nessie
  • TrinoNessieCatalog + NessieIcebergTableOperations that implement the main behavior of the catalog
  • some unit and integration tests that verify that the Iceberg connector works with Nessie (note that the integration test requires a Nessie server to be started, which is being done via the nessie-apprunner-maven-pluginprior to the integration-test phase)

@cla-bot cla-bot Bot added the cla-signed label Mar 29, 2022
@nastra nastra requested a review from findepi March 29, 2022 08:49
@nastra nastra force-pushed the add-nessie-support branch from 66fed2e to 191715e Compare March 29, 2022 11:39
@nastra
Copy link
Copy Markdown
Contributor Author

nastra commented Mar 29, 2022

@findepi looks like one of the CI jobs timed out. would it be possible to restart just this one job?

@nastra nastra closed this Mar 30, 2022
@nastra nastra reopened this Mar 30, 2022
@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented Mar 30, 2022

@nastra GitHub Actions recently introduced "Re-run failed jobs". The feature might be not yet stable though. By the way, we will request to push an empty commit for triggering CI if necessary.

@nastra
Copy link
Copy Markdown
Contributor Author

nastra commented Mar 31, 2022

@ebyhr I don't have permissions unfortunately to re-run the failed job myself, that's why I closeed & reopened the PR, which triggers a full CI run again (same as pushing)

@findinpath
Copy link
Copy Markdown
Contributor

Warning:  Invalid bytecodeVersion for org.glassfish.jersey.core:jersey-common:jar:2.30:compile : META-INF/versions/11/org/glassfish/jersey/internal/jsr166/SubmissionPublisher$1.class: expected 55, but was 56

This library seems to be compiled with Java 12, but Trino works with Java 11

@findinpath findinpath self-requested a review March 31, 2022 13:54
@nastra
Copy link
Copy Markdown
Contributor Author

nastra commented Mar 31, 2022

@findinpath I can see the same warning in other CI runs on other branches. Here's an example from master from https://github.com/trinodb/trino/runs/5763019856?check_suite_focus=true.

Nessie itself is also compiled with Java 11 btw

@nastra nastra force-pushed the add-nessie-support branch from 191715e to 6beef07 Compare March 31, 2022 14:43
@electrum
Copy link
Copy Markdown
Member

The tests should be normal unit tests, not ITs. You can probably use Testcontainers for Nessie.

@findinpath
Copy link
Copy Markdown
Contributor

It would be nice to see some tests which involve dealing with running SQL statements over the query runner to actually test how the connector works with the new catalog type.

Ideal would be (at a later point) to have additionally also a product tests environment (similar to singlenode-spark-iceberg
See for reference:

testing/bin/ptl env describe  --environment EnvSinglenodeSparkIceberg  --config config-default

Comment thread plugin/trino-iceberg/pom.xml Outdated
@findinpath
Copy link
Copy Markdown
Contributor

Thank you @nastra for your contribution. I am looking forward to help out in this PR to add support for Nessie in Trino Iceberg connector.

@nastra nastra force-pushed the add-nessie-support branch from 585fa60 to be8ec7c Compare April 1, 2022 09:43
@nastra
Copy link
Copy Markdown
Contributor Author

nastra commented Apr 1, 2022

@findinpath thanks for the review. I updated the code based on your review and pushed a new commit. Will look into doing some additional testing via SQL statements now.

@findinpath
Copy link
Copy Markdown
Contributor

@nastra you can use the Glue related tests from the connector as a rough template on what kinds of tests you need to add.

@nastra nastra force-pushed the add-nessie-support branch 2 times, most recently from 247f4ee to 9466225 Compare April 1, 2022 14:33
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the client need to know the reference configured for the connector?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes as it needs to operate on the correct branch/tag

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, does the client need to know this detail while it is being created?
Otherwise said, should the catalog deliver the branch/tag name on each call to the client?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eventually this is what we might end up having (the catalog providing the branch/tag to the client), but I didn't want to over-complicate the current implementation. That being said, I would prefer to adjust this part of the code once the catalog knows which branch/tag it is being used with then executing a particular SQL

@nastra nastra force-pushed the add-nessie-support branch from 9466225 to 86ce1dc Compare April 6, 2022 10:31
@findinpath
Copy link
Copy Markdown
Contributor

Could you please add documentation on how to setup Nessie catalog in the Iceberg connector

You can use https://github.com/trinodb/trino/pull/11772/files#diff-e1aabf1cfd8bd8aa7d1b75e70089b57413b2e620a5eebeb36cc76fd3f2ac60db as a template for getting started.

Comment thread plugin/trino-iceberg/pom.xml Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, does the client need to know this detail while it is being created?
Otherwise said, should the catalog deliver the branch/tag name on each call to the client?

@bitsondatadev
Copy link
Copy Markdown
Member

The REST catalog is fine and dandy, but I would prefer nessie. Tabular.io's iceberg-rest isn't production.

Hey @Cerebus. Currently, the primary focus is on the JDBC catalog, but let's see if @findepi has any initial comments.

@keithgchapman
Copy link
Copy Markdown

Hi @bitsondatadev , can you help us getting this PR reviewed approved, It's great that the Iceberg REST catalog got merged, as Nessie is an open source project with support in many query engines we would like to add support for Nessie into Trino as well.

@ajantha-bhat
Copy link
Copy Markdown
Member

PR is rebased and ready for the review. Thanks @nastra

@colebow
Copy link
Copy Markdown
Member

colebow commented Feb 15, 2023

@findinpath can you give this another look?

@cla-bot
Copy link
Copy Markdown

cla-bot Bot commented Feb 16, 2023

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@ajantha-bhat
Copy link
Copy Markdown
Member

@electrum, @findinpath, @ebyhr, @findepi:
Hi, Can we please have another look at this and help in merging this? We are getting quite a few requests for the Trino Iceberg connector supporting the Nessie catalog these days.

@EminUZUN
Copy link
Copy Markdown

EminUZUN commented May 2, 2023

@electrum, @findinpath, @ebyhr, @findepi, @ajantha-bhat:
any news?

@keithgchapman
Copy link
Copy Markdown

Hi Trino community, It's been more than a year since this PR has been opened. All of the review comments have been addressed and it has been rebased many times. Can we get some feedback from the core team as to why this PR is not approved?

The Nessie catalog is a core Iceberg catalog and is implementing the Iceberg Catalog API.

@electrum
Copy link
Copy Markdown
Member

@nastra Can you rebase this, then I'll merge it

@ajantha-bhat
Copy link
Copy Markdown
Member

PR is rebased. Thanks @nastra

@bitsondatadev
Copy link
Copy Markdown
Member

@electrum anything else needed to get this across the finish line?

@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented May 22, 2023

I think we should re-review this PR. For instance, iceberg.nessie.ref config property name should be changed to iceberg.nessie-catalog.ref for consistency with other catalogs.

Comment thread docs/src/main/sphinx/connector/iceberg.rst Outdated
Comment thread docs/src/main/sphinx/connector/iceberg.rst Outdated
Comment thread docs/src/main/sphinx/connector/iceberg.rst
Comment thread pom.xml Outdated
Comment thread testing/trino-product-tests/src/main/java/io/trino/tests/product/TestGroups.java Outdated
@ajantha-bhat
Copy link
Copy Markdown
Member

93 successful, 2 skipped, 1 cancelled, and 1 failing checks

io.trino.plugin.hive.TestHiveConnectorTest.testWriterTasksCountLimitPartitionedScaleWritersEnabled Time elapsed: 0.157 s <<< FAILURE!

It seems the failure is unrelated to the new changes and also it is in the hive connector (changes are for the iceberg connector).
This test case could be flaky.

Closing and re-opening the PR to retrigger the build.

@ajantha-bhat
Copy link
Copy Markdown
Member

ajantha-bhat commented May 22, 2023

@ebyhr: Thanks for the review.

Fixed all the comments. PR is ready for review.

Copy link
Copy Markdown
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash commits into one and remove commit body.

Comment thread plugin/trino-iceberg/pom.xml
Comment thread plugin/trino-iceberg/pom.xml Outdated
}

@Test(groups = {ICEBERG, PROFILE_SPECIFIC_TESTS, ICEBERG_REST, ICEBERG_JDBC}, dataProvider = "storageFormatsWithSpecVersion")
@Test(groups = {ICEBERG, PROFILE_SPECIFIC_TESTS, ICEBERG_REST, ICEBERG_JDBC, ICEBERG_NESSIE}, dataProvider = "storageFormatsWithSpecVersion")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests are missing ICEBERG_NESSIE group.

Copy link
Copy Markdown
Contributor Author

@nastra nastra May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea was to just run a few tests with nessie to keep CI times low and to show that trino+spark+nessie work

Comment thread testing/trino-faulttolerant-tests/pom.xml Outdated
Co-authored-by: Ajantha Bhat <ajanthabhat@gmail.com>
@ajantha-bhat
Copy link
Copy Markdown
Member

The failed test TestHiveStorageFormats > testSelectWithNullFormat(0: StorageFormat{name=SEQUENCEFILE, properties={}, sessionProperties={}}) [groups: storage_formats_detailed] is independent of PR changes and could be flaky.

Re-triggering the build

@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented May 26, 2023

@ajantha-bhat Please note that you don't need to retrigger CI when the failed test is unrelated to the change. Also, we recommend pushing an empty commit instead of reopening PR when retriggering CI.

@ajantha-bhat
Copy link
Copy Markdown
Member

@ajantha-bhat Please note that you don't need to retrigger CI when the failed test is unrelated to the change. Also, we recommend pushing an empty commit instead of reopening PR when retriggering CI.

Thanks. I am new to Trino contributions. Good to know.

@ajantha-bhat
Copy link
Copy Markdown
Member

PR is ready. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.